8 research outputs found

    Extending a geo-catalogue with matching capabilities

    To achieve semantic interoperability, geo-spatial applications need tools that can understand user terminology, which typically differs from the terminology enforced by standards. In this paper we summarize our experience in providing a semantic extension to the geo-catalogue of the Autonomous Province of Trento (PAT) in Italy. The semantic extension is based on the adoption of the S-Match semantic matching tool and on the use of a specifically designed faceted ontology codifying domain-specific knowledge. We also briefly report our experience in integrating the ontology with the geo-spatial ontology GeoWordNet.

    ScienceTreks: an Autonomous Digital Library System

    Purpose of this paper - To support automation of the annotation process for large corpora of digital content. Design/methodology/approach - In this paper we first present and discuss an information extraction pipeline that runs from digital document acquisition to information extraction, processing and management. An overall architecture that supports such an extraction pipeline is detailed and discussed. Findings - The proposed pipeline is implemented in a working prototype of an Autonomous Digital Library system, the ScienceTreks system, that: (1) supports a broad range of methods for document acquisition; (2) does not rely on any external information sources and is based solely on the information in the document itself and in the overall set of documents in a given digital archive; (3) provides an API to support easy integration of external systems and tools into the existing “pipeline”. Practical implications - The proposed Autonomous Digital Library system can be used to automate end-to-end information retrieval and processing, supporting the control and elimination of error-prone human intervention in the process. Originality/value - High-quality automatic metadata extraction is a crucial step in moving from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of Digital Library usability. This, in turn, creates the opportunity for value-added services within existing and future semantic-enabled Digital Library systems.

    Unsupervised Metadata Extraction in Scientific Digital Libraries Using A-Priori Domain-Specific Knowledge

    No full text
    Abstract: Information extraction from unstructured sources is a crucial step in the semantic annotation of content. The challenge is in supporting a high-quality automatic (or at least semi-automatic) approach in order to sustain the scalability of the semantic-enabled services of the future. Unsupervised information extraction encompasses a number of underlying research problems, such as natural language processing, integration of heterogeneous sources, knowledge representation, and others that have been or are currently under investigation. In this paper we concentrate on the problem of unsupervised metadata extraction in the Digital Libraries domain. We propose and present a novel approach that focuses on improving metadata extraction quality without involving external information sources (oracles, manually prepared databases, etc.), relying instead on the information present in the document itself and in its corresponding context. More specifically, we focus on quality improvements of metadata extraction from scientific papers (mainly in the computer science domain) collected from various sources over the Internet. Finally, we compare the results of our approach with the state of the art in the domain and discuss future work.

    On the Binary Sequences with Indistinguishable Signature for a Given Error Multiplicity in Electronic Testing

    Demidenko, Sergey; Ivanyukovich, Alexander; Makhnist, Leonid; Piuri, Vincenzo. On binary sequences with indistinguishable signature for a given error multiplicity in electronic testing. Distinct binary sequences (2^m - 1 bits long) may be compressed by an m-bit signature register into the same signature value when a given error multiplicity is considered. Analytical expressions to compute the number of distinct sequences collapsed into the same signature are presented, by exploiting the properties of binary Hamming code theory and of the binomial coefficients.
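The collapse of distinct sequences into one signature can be illustrated with a small brute-force count. This is only a sketch under stated assumptions, not the analytical expressions from the paper: it assumes a serial-input signature register that computes the remainder of polynomial division over GF(2), and it picks m = 3 with the primitive feedback polynomial x^3 + x + 1 purely for illustration. The names `signature` and `aliasing_counts` are hypothetical. Because the register is linear, two sequences share a signature exactly when their bitwise difference (the error pattern) has signature zero, so counting zero-signature error patterns of weight k gives the number of sequences that differ from a given one in k bits and are indistinguishable from it.

```python
from itertools import combinations


def signature(bits, poly, m):
    """Remainder of bits(x) divided by poly(x) over GF(2).

    bits: iterable of 0/1 values, highest-degree coefficient first.
    poly: integer bit mask of the feedback polynomial, including the x^m term.
    m:    register length in bits (degree of the polynomial).
    """
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if reg >> m:        # degree reached m: subtract (XOR) the polynomial
            reg ^= poly
    return reg


def aliasing_counts(m, poly):
    """For each error multiplicity k, count error patterns of length 2^m - 1
    that leave the signature unchanged (i.e. have signature zero)."""
    n = (1 << m) - 1
    counts = {}
    for k in range(1, n + 1):
        count = 0
        for positions in combinations(range(n), k):
            error = [1 if i in positions else 0 for i in range(n)]
            if signature(error, poly, m) == 0:
                count += 1
        counts[k] = count
    return counts


# Assumed example: m = 3, feedback polynomial x^3 + x + 1 (bit mask 0b1011).
counts = aliasing_counts(3, 0b1011)
print(counts)
```

In this assumed setting the zero-signature error patterns are the nonzero codewords of the cyclic (7, 4) Hamming code, so no error of multiplicity 1 or 2 goes undetected, while some errors of multiplicity 3, 4 and 7 do. The brute-force loop is exponential in the sequence length and is only practical for small m.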

    Machine Learning-Based Keywords Extraction for Scientific Literature

    No full text
    Abstract: With the growing interest in the Semantic Web, keyword/metadata extraction is coming to play an increasingly important role. Keyword extraction from documents is a complex task in natural language processing. Ideally this task involves sophisticated semantic analysis. However, the complexity of the problem makes current semantic analysis techniques insufficient. Machine learning methods can support the initial phases of keyword extraction and can thus improve the input to further semantic analysis phases. In this paper we propose machine learning-based keyword extraction for a given document domain, namely scientific literature. More specifically, the least squares support vector machine is used as the machine learning method. The proposed method takes advantage of machine learning techniques and moves the complexity of the task to the process of learning from appropriate samples obtained